Abstract

The floating-point division bug in Intel's Pentium processor and the overflow flag erratum of the FIST instruction in Intel's
Pentium Pro and Pentium II processors have demonstrated the importance and the difficulty of verifying floating-point
arithmetic circuits. In this paper, we present a "black box" approach to the verification of FP adders. In our approach, FP adders
are verified by an extended word-level SMV using reusable specifications, without knowledge of the circuit implementation.
Word-level SMV is improved by using Multiplicative Power HDDs (*PHDDs) and by incorporating conditional symbolic
simulation as well as a short-circuiting technique. Based on a case analysis, the adder specification is divided into several
hundred implementation-independent sub-specifications. We applied our system and these specifications to verify the
IEEE double precision FP adder in the Aurora III Chip from the University of Michigan. Our system found several design
errors in this FP adder. Each specification can be checked in less than 5 minutes. A variant of the corrected FP adder was
created to illustrate the ability of our system to handle different FP adder designs. For each adder, the verification task
finished in 2 CPU hours on a Sun UltraSPARC-II server.
1 Introduction


The floating-point (FP) division bug [11, 22] in Intel's Pentium processor and the overflow flag erratum of the
FIST instruction (floating-point to integer conversion) [14] in Intel's Pentium Pro and Pentium II processors
have demonstrated the importance and the difficulty of verifying floating-point arithmetic circuits, as well as the high
cost of an arithmetic bug. FP adders are the most common units in floating-point processors. Modern high-speed
FP adders [20, 23] are very complicated, because they require many types of modules: a right shifter for
alignment, a left shifter for normalization, a leading zero anticipator (LZA), an adder for mantissas, a rounding
unit, etc. Exhaustive simulation or formal verification can be used to ensure the correctness of FP adders.

Most of the IEEE floating-point standard has been formalized by Carreno and Miner [5] in the HOL and
PVS theorem provers. Theorem provers have been used to verify arithmetic circuits [18, 21]. However, theorem
provers require users to apply detailed knowledge of the circuit, and the verification process for floating-point
circuits is very tedious. Another drawback of theorem provers is that the proofs are implementation-dependent.

After the famous Pentium division bug [11], Intel researchers applied word-level SMV [9] with Hybrid Decision
Diagrams (HDDs) [8] to verify the functionality of the floating-point unit in one of Intel's processors [7].
Due to the limitations of HDDs, the FP adder was partitioned into several sub-circuits whose specifications were
expressed in terms of integer functions. Each sub-circuit was verified individually based on assumptions about
its inputs. The correctness of the overall circuit had to be ascertained manually from the verified specifications
of the sub-circuits. This partitioning approach is error prone, since mistakes could be introduced in any of the
following steps: partitioning the circuit and the specification, performing case analysis for each sub-circuit, and
proving the overall correctness of the circuit, which potentially could have used a theorem prover. Moreover,
their specifications are highly dependent on the circuit implementations.

The combination of model checking and theorem proving techniques was used to verify an IEEE double
precision floating-point multiplier [1]. The circuit was partitioned into several sub-circuits which could be
verified by model checking. The theorem prover ensured the completeness of the proof by using inference rules to
compose the verified specifications. This approach combines the strengths of both techniques. However, the
proofs based on this approach are still implementation-dependent.

To the best of our knowledge, only two types of arithmetic circuits can be verified by treating them as
black boxes (i.e., with specifications that contain only the inputs and outputs). First, an integer adder can be verified
by using Binary Decision Diagrams (BDDs) [2]. Second, Hamaguchi et al. [15] presented the verification
of integer multipliers without knowing their implementations, using Multiplicative Binary Moment Diagrams
(*BMDs) [4]. However, their approach does not work for incorrect designs, because the *BMDs explode in
size and counterexamples cannot be generated for debugging. None of the previous approaches can verify FP
adders without knowing their circuit implementations.

In this paper, we present a black box approach to the verification of FP adders. In our approach, an FP adder is
treated as a black box and is verified by an extended version of word-level SMV with reusable specifications.
Word-level SMV is improved by using Multiplicative Power HDDs (*PHDDs) [6] to represent the FP functions,
and by incorporating conditional symbolic simulation as well as a short-circuiting technique. The FP
adder specification is divided into several hundred sub-specifications based on the sign bits and the exponent
differences. These sub-specifications are implementation-independent, since they use only the input and output
signals of FP adders.

The idea of conditional symbolic simulation is to perform symbolic simulation of the circuit under some
conditions that restrict its behavior. This approach can be viewed as dynamically extracting the circuit
behavior under the given conditions without modifying the actual circuit. Can we verify the specifications of
FP adders using conditional symbolic simulation while avoiding any use of circuit knowledge? We identify a conflict
in variable orderings between the mantissa comparator and the mantissa adder, which causes BDD explosion
in conditional symbolic simulation. A short-circuiting technique that overcomes this ordering conflict is
presented and integrated into the word-level SMV package. In general, this short-circuiting technique can be used
when different parts of the circuit are used under different operating conditions.

We used our system and these specifications to verify the FP adder in the Aurora III Chip [16] at the
University of Michigan. This FP adder is based on the design described in [20] and supports IEEE double
precision and all four IEEE rounding modes. In this work, we verified the FP adder only in the
round-to-nearest mode, because we believe that this is the most challenging rounding mode for verification.
Our system found several design errors. Each specification can be checked in less than 3 minutes, or less than 5 minutes
including counterexample generation. A variant of the corrected FP adder was created and verified to illustrate
the ability of our system to handle different FP adder designs. For each FP adder, verification took 2 CPU
hours. We believe that our system and specifications can be applied directly to verify other FP adder designs
and to help find design errors.

The overflow flag erratum of the FIST instruction (FP to integer conversion) [14] in Intel's Pentium Pro and
Pentium II processors has illustrated the importance of verifying conversion circuits, which convert
data from one format to another (e.g., IEEE single precision to double precision). Since these circuits
are much simpler than FP adders and have only one input operand, we believe that our system can be used to
verify their correctness.

2 *PHDD Preliminaries

For expressing functions from Boolean variables to integer values, BMDs [4] use the moment decomposition
of a function:

$$
\begin{aligned}
f &= (1 - x)\cdot f_{\bar{x}} + x\cdot f_x \\
  &= f_{\bar{x}} + x\cdot (f_x - f_{\bar{x}}) \\
  &= f_{\bar{x}} + x\cdot f_{\delta x}
\end{aligned}
\qquad (1)
$$

where $\cdot$, $+$ and $-$ denote multiplication, addition and subtraction, respectively. Term $f_x$ ($f_{\bar{x}}$) denotes the
positive (negative) cofactor of $f$ with respect to variable $x$, i.e., the function resulting when the constant 1 (0)
is substituted for $x$. By rearranging the terms, we obtain the third line of Equation 1. Here, $f_{\delta x} = f_x - f_{\bar{x}}$ is
called the linear moment of $f$ with respect to $x$. This terminology arises by viewing $f$ as a linear function with
respect to its variables, and thus $f_{\delta x}$ is the partial derivative of $f$ with respect to $x$. The negative cofactor $f_{\bar{x}}$ will
be termed the constant moment, i.e., it denotes the portion of function $f$ that remains constant with respect to $x$.
This decomposition is also called positive Davio in K*BMDs [13]. Each vertex of a BMD describes a function
in terms of its moment decomposition with respect to the variable labeling the vertex. The two outgoing arcs
denote the constant and linear moments of the function with respect to the variable.
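To make the moment decomposition concrete, the sketch below computes the constant and linear moments of the Figure 1 function by brute-force evaluation of Python callables rather than by building a BMD graph. The helper name `moments` is illustrative only; it simply checks Equation 1 point by point.

```python
from itertools import product

def f(x, y):
    # Example function of Figure 1: f = 1 + y + 3x + 3xy
    return 1 + y + 3 * x + 3 * x * y

def moments(func, var):
    """Split func into its constant moment (negative cofactor) and its
    linear moment (positive minus negative cofactor) with respect to var."""
    def const(**env):
        return func(**{**env, var: 0})
    def linear(**env):
        return func(**{**env, var: 1}) - func(**{**env, var: 0})
    return const, linear

f_const, f_lin = moments(f, "x")

# Equation 1: f = f_xbar + x * f_dx for every assignment of x and y.
for x, y in product((0, 1), repeat=2):
    assert f(x, y) == f_const(y=y) + x * f_lin(y=y)

print([f_const(y=y) for y in (0, 1)])  # → [1, 2]  (constant moment 1 + y)
print([f_lin(y=y) for y in (0, 1)])    # → [3, 6]  (linear moment 3 + 3y)
```

The linear moment is three times the constant moment, which is exactly the common factor that a *BMD extracts as an edge weight.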

Clarke et al. [8] extended BMDs to a form they call Hybrid Decision Diagrams (HDDs), where a function
may be decomposed with respect to each variable using one of six decomposition types. In our experience with
HDDs, we found that three of the six decomposition types are useful in the verification of arithmetic circuits:
Shannon, positive Davio, and negative Davio. Therefore, Equation 1 is
generalized to the following three equations according to the variable's decomposition type:

$$
f = \begin{cases}
(1 - x)\cdot f_{\bar{x}} + x\cdot f_x & \text{(Shannon)}\\
f_{\bar{x}} + x\cdot f_{\delta x} & \text{(Positive Davio)}\\
f_x - (1 - x)\cdot f_{\delta x} & \text{(Negative Davio)}
\end{cases}
\qquad (2)
$$

Here, $f_{\delta x} = f_x - f_{\bar{x}}$ is the partial derivative of $f$ with respect to $x$. The BMD representation is a subset of
HDDs. In other words, the HDD graph is the same as the BMD graph if all of the variables use positive Davio
decomposition.
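As a sanity check on Equation 2, the following sketch re-evaluates the Figure 1 function through each of the three decomposition types and confirms that all three reproduce the original values. The function `rebuild` is a hypothetical helper that works on point evaluations, not on decision-diagram nodes.

```python
from itertools import product

def f(x, y):
    # Example function of Figure 1: f = 1 + y + 3x + 3xy
    return 1 + y + 3 * x + 3 * x * y

def rebuild(func, var, kind, env):
    """Re-evaluate func at assignment env (a dict) through one decomposition
    type of Equation 2, taken with respect to variable var."""
    x = env[var]
    f_neg = func(**{**env, var: 0})   # negative cofactor f_xbar
    f_pos = func(**{**env, var: 1})   # positive cofactor f_x
    f_dx = f_pos - f_neg              # linear moment (partial derivative)
    if kind == "shannon":
        return (1 - x) * f_neg + x * f_pos
    if kind == "positive_davio":
        return f_neg + x * f_dx
    if kind == "negative_davio":
        return f_pos - (1 - x) * f_dx
    raise ValueError(f"unknown decomposition type: {kind}")

for x, y in product((0, 1), repeat=2):
    env = {"x": x, "y": y}
    for kind in ("shannon", "positive_davio", "negative_davio"):
        assert rebuild(f, "x", kind, env) == f(x, y)
```

The three forms differ only in which sub-functions become the children of a vertex, which is what lets an HDD choose a per-variable representation.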

Chen and Bryant [6] introduced multiplicative power-of-constant edge weights into HDDs to form a new
representation called Multiplicative Power HDDs (*PHDDs). *PHDDs use three of the six HDD decompositions,
as expressed in Equation 2. Unlike the edge weights in *BMDs, the edge weights of *PHDDs are powers of a
constant c. Thus, Equation 2 is rewritten as:

$$
f = \begin{cases}
\langle w,\ (1 - x)\cdot f_{\bar{x}} + x\cdot f_x \rangle & \text{(Shannon)}\\
\langle w,\ f_{\bar{x}} + x\cdot f_{\delta x} \rangle & \text{(Positive Davio)}\\
\langle w,\ f_x - (1 - x)\cdot f_{\delta x} \rangle & \text{(Negative Davio)}
\end{cases}
$$

where $\langle w, f\rangle$ denotes $c^w \times f$. In general, the constant c can be any positive integer. Since the base of
the exponent in the IEEE floating-point (FP) format is 2, we consider only c = 2 for the remainder of this
paper. Observe that w can be negative, allowing representation of rational numbers. The power edge weights
enable us to represent functions mapping Boolean variables to FP values without using rational numbers in our
representation.
Figure 1: An integer function of Boolean variables, f = 1 + y + 3x + 3xy, represented by (a) a
truth table, (b) a BMD, (c) a *BMD, (d) an HDD with Shannon decompositions, and (e) a *PHDD with Shannon
decompositions. The dashed edges are 0-branches and the solid edges are 1-branches. The variables with
Shannon and positive Davio decomposition types are drawn in thin and thick vertices, respectively. The
number i in the thin and thick boxes represents i and 2^i, respectively.

Figure 1 shows an integer function f with Boolean variables x and y represented by a truth table, a BMD,
a *BMD, and an HDD with Shannon decompositions (also called an MTBDD [10]). In our drawing, the variables
with Shannon and positive Davio decomposition types are drawn in thin and thick vertices, respectively. The
number i in the thin and thick boxes represents i and 2^i, respectively. A dashed line from a vertex with variable
x points to the vertex representing the function $f_{\bar{x}}$, $f_{\bar{x}}$, or $f_x$ for the Shannon, positive Davio, or negative Davio
decomposition, respectively. Similarly, a solid line from a vertex with variable x points to the vertex representing
the function $f_x$, $f_{\delta x}$, or $f_{\delta x}$ for the Shannon, positive Davio, or negative Davio decomposition, respectively. Figure 1.b shows
the BMD representation. To construct this graph, we apply Equation 1 to function f recursively. First, with
respect to variable x, we get $f_{\bar{x}} = 1 + y$, represented as the graph attached to the dashed edge from vertex x,
and $f_{\delta x} = 3 + 3y$, represented by the solid branch from vertex x. Observe that $f_{\delta x}$ can be expressed as $3 \times f_{\bar{x}}$.
By extracting the factor 3 from $f_{\delta x}$, the graph becomes Figure 1.c. This graph is called a Multiplicative BMD
(*BMD), in which the greatest common divisor (GCD) is extracted from both branches. The edge weights
combine multiplicatively. The HDD with Shannon decompositions can be constructed from the truth table.
The dashed branch of vertex x (i.e., $f_{\bar{x}}$) is constructed from the first two entries of the table, and the solid branch
of vertex x (i.e., $f_x$) is constructed from the last two entries of the table. Observe that $f_x$ is equal to $2^2 \times f_{\bar{x}}$.
The *PHDD can be constructed by extracting the power-of-2 weights from the HDD recursively.

Observe that if variables x and y are viewed as bits forming a 2-bit binary number, X = y + 2x, then the
function f can be rewritten as f = 2^{2x+y} = 2^X. HDDs with Shannon decompositions and
BMDs grow exponentially for this type of function, while *BMDs can represent it efficiently thanks to their edge weights.
However, *BMDs and HDDs cannot represent functions such as f = 2^{X-B}, where B is a constant, because
they can only represent integer functions. *PHDDs can represent this type of function efficiently by adding
the edge weight -B on top of the graph of 2^X. Therefore, *PHDDs can represent the FP encoding and
operations efficiently. Readers can refer to [6] for more details of FP representation using *PHDDs.
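The size argument can be illustrated by counting shared sub-tables in a Shannon decision tree, with and without power-of-2 weights factored out. This is a simplified model, not the *PHDD algorithm of [6]: the weight-extraction rule used here (divide each sub-table by 2^w, where w is the smallest power of 2 among its nonzero entries) is an assumption chosen for illustration, and all helper names are hypothetical.

```python
from fractions import Fraction

def v2(q):
    """2-adic valuation of a nonzero rational q, i.e. the w in q = 2**w * odd/odd."""
    num, den = abs(q.numerator), q.denominator
    return ((num & -num).bit_length() - 1) - ((den & -den).bit_length() - 1)

def normalize(table):
    """Factor a power-of-2 weight out of a table of values (simplified rule:
    divide by 2**w, where w is the smallest valuation among nonzero entries)."""
    ws = [v2(v) for v in table if v != 0]
    w = min(ws) if ws else 0
    scale = Fraction(2) ** w
    return w, tuple(v / scale for v in table)

def count_nodes(values, weighted):
    """Count distinct non-terminal nodes of a Shannon decision tree over
    `values` (index bits are the variables, MSB first), sharing identical
    sub-tables. With weighted=True, tables are compared after removing 2**w."""
    seen = set()

    def visit(table):
        if weighted:
            _, table = normalize(table)
        if len(table) == 1 or table in seen:
            return
        seen.add(table)
        half = len(table) // 2
        visit(table[:half])   # 0-branch: negative cofactor of the top variable
        visit(table[half:])   # 1-branch: positive cofactor of the top variable

    visit(tuple(Fraction(v) for v in values))
    return len(seen)

n, B = 8, 1023
pow2 = [2 ** X for X in range(2 ** n)]                          # f = 2^X
pow2_biased = [Fraction(2) ** (X - B) for X in range(2 ** n)]   # f = 2^(X-B)

print(count_nodes(pow2, weighted=False))        # → 255: no sharing, exponential in n
print(count_nodes(pow2, weighted=True))         # → 8: one node per variable
print(count_nodes(pow2_biased, weighted=True))  # → 8: the -B weight sits at the root
```

With weights removed, the two cofactors of every vertex of 2^X become identical, so each level collapses to a single node; the rational bias only changes the weight at the top.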

3  Floating-Point  Adders 

Let us consider the representation of FP numbers by IEEE Standard 754. Double-precision FP numbers are
stored in 64 bits: 1 bit for the sign (Sx), 11 bits for the exponent (Ex), and 52 bits for the mantissa (Nx). The
exponent is a signed number represented with a bias (B) of 1023. The mantissa (Nx) represents a number less
than 1. Based on the value of the exponent, the IEEE FP format can be divided into four cases:

$$
X = \begin{cases}
(-1)^{S_x} \times 1.N_x \times 2^{E_x - B} & \text{if } 0 < E_x < \text{all 1s (normal)}\\
(-1)^{S_x} \times 0.N_x \times 2^{1 - B} & \text{if } E_x = 0 \text{ (denormal)}\\
\text{NaN} & \text{if } E_x = \text{all 1s} \ \&\ N_x \neq 0\\
(-1)^{S_x} \times \infty & \text{if } E_x = \text{all 1s} \ \&\ N_x = 0
\end{cases}
$$

where NaN denotes Not-a-Number and ∞ represents infinity. Let Mx = 1.Nx or 0.Nx. Let m be the number
of mantissa bits including the bit to the left of the binary point, and n be the number of exponent bits. For IEEE
double precision, m = 53 and n = 11.
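The four-case encoding can be exercised directly on 64-bit patterns. The sketch below is an illustrative decoder (the names `decode_double` and `bits_of` are hypothetical) that follows the case split above and cross-checks the result against Python's own float, which is an IEEE double on common platforms; zero falls under the Ex = 0 case.

```python
import math
import struct

B = 1023  # exponent bias for IEEE double precision

def decode_double(bits):
    """Classify a 64-bit IEEE double pattern and return (case, value)
    following the four-case definition of the format."""
    s = bits >> 63
    e = (bits >> 52) & 0x7FF          # 11 exponent bits (Ex)
    nx = bits & ((1 << 52) - 1)       # 52 stored mantissa bits (Nx)
    sign = -1.0 if s else 1.0
    if e == 0x7FF:                    # exponent all 1s
        return ("nan", math.nan) if nx else ("inf", sign * math.inf)
    if e == 0:
        # 0.Nx x 2^(1-B); zero also lands here. ldexp scales exactly.
        return "denormal", sign * math.ldexp(nx, 1 - B - 52)
    # 1.Nx x 2^(Ex-B): restore the hidden 1 as bit 52
    return "normal", sign * math.ldexp((1 << 52) | nx, e - B - 52)

def bits_of(x):
    """Raw 64-bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

for x in (1.5, -2.0, 5e-324, math.inf):
    case, value = decode_double(bits_of(x))
    assert value == x
```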

Due to this encoding, an operation on two FP numbers cannot be rewritten as an arithmetic function of
two inputs. For example, the addition of two FP numbers X (Sx, Ex, Mx) and Y (Sy, Ey, My) cannot be
expressed as X + Y, because of special cases when one of them is NaN or ±∞. Table 1 summarizes the
possible results of the FP addition of two numbers X and Y, where F represents a normalized or denormalized
number. The result can be expressed as Round(X + Y) only when both operands have normal or denormal
values. Otherwise, the result is determined by the case. When one operand is +∞ and the other is −∞, the FP
adder should raise the FP invalid arithmetic operand exception.

Table 1: Summary of the FP addition of two numbers X and Y. F represents the normal and denormal
numbers. * indicates FP invalid arithmetic operands.
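The case split behind Table 1 can be sketched as a small dispatcher. This follows IEEE 754 addition semantics for quiet NaNs and infinities (signaling-NaN handling is not modeled), and the function name `fp_add_case` is hypothetical; it returns which result applies rather than computing the sum.

```python
import math

def fp_add_case(x, y):
    """Return which result of the addition case table applies to x + y."""
    if math.isnan(x) or math.isnan(y):
        return "NaN"                       # any NaN operand propagates
    if math.isinf(x) and math.isinf(y):
        # +inf + -inf is the invalid-operation entry marked "*"
        return "*" if x != y else ("+inf" if x > 0 else "-inf")
    if math.isinf(x) or math.isinf(y):
        inf = x if math.isinf(x) else y    # the infinite operand wins
        return "+inf" if inf > 0 else "-inf"
    return "Round(X + Y)"                  # both operands are F (normal/denormal)
```

Only the last branch needs the arithmetic specifications developed in Section 4; the other branches can be checked at the bit level.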

Figure 2.a shows the block diagram of the SNAP FP adder designed at Stanford University [20]. This
adder was designed for fast operation based on the following facts. First, the alignment (right shift) and
normalization (left shift) needed for addition are mutually exclusive. When a massive right shift is performed
during alignment, a massive left shift is not needed. On the other hand, a massive left shift is required only
when the mantissa adder performs subtraction and the absolute value of the exponent difference is less than 2 (i.e.,
no massive right shift). Second, rounding can be performed by having the mantissa adder generate A + C,
A + C + 1 and A + C + 2, where A and C are the inputs of the mantissa adder, and using the final multiplexor
to choose the correct output.
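The first fact can be checked exhaustively for a small mantissa width. The sketch below assumes normalized m-bit integer mantissas and effective subtraction with Ex ≥ Ey, and measures the largest left shift needed to renormalize Mx − (My >> d) for each exponent difference d; the width m = 6 and the helper names are arbitrary illustrative choices.

```python
from fractions import Fraction

def leading_zeros(value, m):
    """Number of left shifts needed to bring value into [2^(m-1), 2^m)."""
    shifts = 0
    while value < 2 ** (m - 1):
        value *= 2
        shifts += 1
    return shifts

def max_left_shift(m, d):
    """Largest normalization shift over all normalized m-bit mantissas when
    My >> d is subtracted exactly from Mx (effective subtraction, Ex >= Ey)."""
    worst = 0
    for mx in range(2 ** (m - 1), 2 ** m):
        for my in range(2 ** (m - 1), 2 ** m):
            diff = mx - Fraction(my, 2 ** d)
            if diff > 0:
                worst = max(worst, leading_zeros(diff, m))
    return worst

print({d: max_left_shift(6, d) for d in range(5)})
# → {0: 5, 1: 6, 2: 1, 3: 1, 4: 1}
```

Only d = 0 and d = 1 (the close case) ever need a massive left shift; for d ≥ 2 at most a 1-bit shift is required, which is why the two shifters never work on the same operation.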


Figure 2: The Stanford FP adder (a) and a variant (b).

In the exponent path, the exponent subtractor computes the difference of the exponents. The MuxAbs unit
computes the absolute value of the difference for alignment. The larger exponent is selected as the input to the
exponent adjust adder. During normalization, the mantissa may need a right shift, no shift, or a massive left
shift. The exponent adjust adder is prepared to handle all of these cases.

In the mantissa path, the operands are swapped as needed depending on the result of the exponent subtractor.
The inputs to the mantissa adder are the mantissa with the larger exponent (A) and one of three versions of the
mantissa with the smaller exponent (C): unshifted, right shifted by 1, and right shifted by many bits. The path select
unit chooses the correct version of C based on the value of the exponent difference. The version right shifted by
many bits is provided by the right shifter, which also computes the information needed for the sticky bit. The
mantissa adder performs the addition or subtraction of its two inputs depending on the signs of both operands
and the operation (add or subtract). If the adder performs subtraction, the mantissa with the smaller exponent is
first complemented. The adder generates all possible outcomes (A + C, A + C + 1, and A + C + 2) needed
to obtain the final, normalized and rounded result. A + C + 2 is required because of the possible right shift
during normalization. For example, when the most significant bits of A and C are both 1, A + C will have m + 1 bits
and must be right shifted by 1 bit. If the rounding logic decides to add 1 to the least significant bit of the
right-shifted result, this is equivalent to adding 2 to A + C. When the operands have the same exponent and
the mantissa adder performs subtraction, the outputs of the adder could be negative. The ones complementer is used to
make them positive. Then, one of these outputs is selected by the GRS unit to account for rounding. The
GRS unit also computes the true guard (G), round (R), and sticky (S) bits and the bit to be left shifted into the result
during normalization. When the operands are close (the exponent difference is 0, 1, or −1) and
the mantissa adder performs subtraction, the result may need a massive left shift for normalization. The amount of left
shift is predicted by the leading zero anticipator (LZA) unit in parallel with the mantissa adder. The predicted
amount may differ by one from the correct amount, but this 1-bit shift is made up by a 1-bit fine adjust unit.
Finally, one of the four possible results is selected to yield the final, rounded, and normalized result based on
the outputs of the path select and GRS units.
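The role of the three adder outputs can be seen in a small model of round-to-nearest-even for true addition. This is an illustrative sketch, not the SNAP logic: it assumes integer m-bit mantissas, lets an exact rational stand in for the shifter's sticky information, and checks the selected candidate against a reference rounding of the exact sum. All function names are hypothetical.

```python
from fractions import Fraction

def snap_round_add(A, My, d, m):
    """Round-to-nearest-even of A + (My >> d) for true addition, selecting among
    A + C, A + C + 1 and A + C + 2. A and My are m-bit integer mantissas and
    d = Ex - Ey >= 0. Returns (rounded mantissa, normalization right shifts)."""
    C = My >> d                         # bits that reach the mantissa adder
    frac = Fraction(My, 2 ** d) - C     # shifted-out bits, in units of the LSB
    s = A + C
    if s < 2 ** m:
        # No carry-out: G is the first shifted-out bit; R and S merge into sticky.
        guard = frac >= Fraction(1, 2)
        sticky = frac not in (0, Fraction(1, 2))
        up = guard and (sticky or s % 2 == 1)
        return s + int(up), 0           # selects A + C or A + C + 1
    # Carry-out: the sum is shifted right by one, so +1 in the new LSB is +2 here.
    guard = s % 2 == 1
    sticky = frac != 0
    up = guard and (sticky or (s >> 1) % 2 == 1)
    return (s + 2 * int(up)) >> 1, 1    # selects A + C or A + C + 2, then shifts

def reference_round(value, m):
    """Exact value rounded to m significant bits, ties to even (value >= 2^(m-1))."""
    shift = 0
    while value >= 2 ** m:
        value /= 2
        shift += 1
    return round(value) * 2 ** shift    # round() on a Fraction rounds half to even

m = 5
for A in range(2 ** (m - 1), 2 ** m):
    for My in range(2 ** (m - 1), 2 ** m):
        for d in range(m + 3):
            mant, shift = snap_round_add(A, My, d, m)
            assert mant * 2 ** shift == reference_round(A + Fraction(My, 2 ** d), m)
```

A rounding carry that produces exactly 2^m is left unnormalized here for brevity; the value is still correct, and the exponent logic would absorb it.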


Figure 3: Detailed circuit of the compare unit.

As an alternative to the SNAP design, the ones complementer after the mantissa adder can be avoided if we
ensure that input C of the mantissa adder is smaller than or equal to input A when the exponent difference
is 0 and the operation of the mantissa adder is subtraction. To ensure this property, a mantissa comparator and
extra circuits, as shown in [23], are needed to swap the mantissas correctly. Figure 2.b shows a variant of
the SNAP FP adder with this modification (the compare unit is added and the ones complementer is deleted).
This compare unit exists in many modern high-speed FP adder designs [23] and makes the verification harder, as
described in Section 5.4. Figure 3 shows the detailed circuit of the compare unit, which generates the signal to
swap the mantissas. The signal Ex < Ey comes from the exponent subtractor. When Ex < Ey, or Ex = Ey
and Mx < My (i.e., h = 1), A is My (i.e., the mantissas are swapped). Otherwise, A is Mx.
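The swap rule can be modeled in a few lines to show why the ones complementer becomes unnecessary. The sketch below is a behavioral model only (the names `swap_signal` and `adder_inputs` are hypothetical and the gate-level structure of Figure 3 is not reproduced); it checks that with equal exponents the adder never sees C > A.

```python
def swap_signal(ex, ey, mx, my):
    """Behavioral swap condition: Ex < Ey, or Ex = Ey and Mx < My."""
    return ex < ey or (ex == ey and mx < my)

def adder_inputs(ex, ey, mx, my):
    """Return (A, C) for the mantissa adder, before alignment of C."""
    return (my, mx) if swap_signal(ex, ey, mx, my) else (mx, my)

# With equal exponents, A >= C always holds, so A - C is never negative.
for e in range(4):
    for mx in range(8, 16):
        for my in range(8, 16):
            A, C = adder_inputs(e, e, mx, my)
            assert A - C >= 0
```

Note that the comparison of Mx and My is a full-width magnitude comparison, which is the source of the variable-ordering conflict with the mantissa adder discussed in the introduction.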


4  Specifications  of  FP  Adders 


In this section, we focus on the general specifications of the FP adder, especially when both operands have
denormal or normal values. For the cases in which at least one of the operands is a NaN or ∞, the specifications
can be easily written at the bit level. For example, when both operands are NaN, the expected output is NaN
(i.e., the exponent is all 1s and the mantissa is not equal to zero). The specification can be expressed as: the
"AND" of the exponent output bits is 1 and the "OR" of the mantissa output bits is 1.

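The bit-level NaN specification translates directly into a predicate on the 64 output bits. The sketch below is illustrative (the names `nan_spec_holds` and `bits_of` are hypothetical) and uses Python floats only to produce test patterns.

```python
import math
import struct

def nan_spec_holds(out_bits):
    """Bit-level NaN specification for a 64-bit result: the AND of the 11
    exponent bits is 1 and the OR of the 52 mantissa bits is 1."""
    exponent_bits = [(out_bits >> (52 + k)) & 1 for k in range(11)]
    mantissa_bits = [(out_bits >> k) & 1 for k in range(52)]
    return all(exponent_bits) and any(mantissa_bits)

def bits_of(x):
    """Raw 64-bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

assert nan_spec_holds(bits_of(math.nan))
assert not nan_spec_holds(bits_of(math.inf))   # all-1s exponent, zero mantissa
assert not nan_spec_holds(bits_of(1.0))
```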
When both operands have normal or denormal values, the ideal specification is OUT = Round(X + Y).
However, the *PHDD size of FP addition grows exponentially with the word size of the exponent part. Thus,
the specification must be divided into several sub-specifications for verification. According to the signs of both
operands, the function X + Y can be rewritten as Equation 3.


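$$
X + Y = \begin{cases}
(-1)^{S_x} \times \left(2^{E_x - B} \times M_x + 2^{E_y - B} \times M_y\right) & S_x = S_y \ \text{(true addition)}\\
(-1)^{S_x} \times \left(2^{E_x - B} \times M_x - 2^{E_y - B} \times M_y\right) & S_x \neq S_y \ \text{(true subtraction)}
\end{cases}
\qquad (3)
$$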
Similarly, for FP subtraction, the function X − Y can also be rewritten as true addition when both operands
have different signs, and as true subtraction when both operands have the same sign.
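Equation 3 and the subtraction rule can be checked with exact rational arithmetic. In this sketch, `effective_operation` and `add_by_equation_3` are hypothetical helpers, mantissas are given as rationals in [1, 2), and a small bias B = 3 is used only to keep the numbers readable.

```python
from fractions import Fraction
from itertools import product

def effective_operation(op, sx, sy):
    """Classify X op Y, with op '+' or '-', as true addition or true subtraction."""
    if op == "-":
        sy ^= 1          # X - Y = X + (-Y): flip the sign bit of Y
    return "true addition" if sx == sy else "true subtraction"

def add_by_equation_3(sx, sy, ex, ey, mx, my, B):
    """Exact X + Y computed with the two cases of Equation 3."""
    a = Fraction(2) ** (ex - B) * mx
    b = Fraction(2) ** (ey - B) * my
    return (-1) ** sx * (a + b if sx == sy else a - b)

# Equation 3 agrees with the signed sum for every combination of sign bits.
mx, my = Fraction(3, 2), Fraction(5, 4)
for sx, sy in product((0, 1), repeat=2):
    x = (-1) ** sx * Fraction(2) ** (5 - 3) * mx
    y = (-1) ** sy * Fraction(2) ** (4 - 3) * my
    assert add_by_equation_3(sx, sy, 5, 4, mx, my, B=3) == x + y
```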

4.1 True Addition


Figure 4: Cases of true addition for the mantissa part: (a) Ex − Ey < m; (b) Ex − Ey ≥ m.


The *PHDDs for true addition and true subtraction still grow exponentially. Based on the sizes of the two
exponents, the function X + Y for true addition can be rewritten as Equation 4.


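$$
X + Y = (-1)^{S_x} \times \begin{cases}
2^{E_x - B} \times \left(M_x + (M_y \gg (E_x - E_y))\right) & E_y \le E_x\\
2^{E_y - B} \times \left(M_y + (M_x \gg (E_y - E_x))\right) & E_y > E_x
\end{cases}
\qquad (4)
$$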


When Ey ≤ Ex, the exponent is Ex and the mantissa is the sum of Mx and My right shifted by (Ex − Ey) bits
(i.e., My ≫ (Ex − Ey) in the equation, where the shift is exact, i.e., a division by 2^(Ex − Ey)). Ex − Ey can range
from 0 to 2^n − 2, but the FP format has only m mantissa bits. Figure 4 illustrates the possible cases of true
addition for Ey ≤ Ex based on the value of Ex − Ey. In Figure 4.a, for 0 ≤ Ex − Ey < m, the intermediate (precise)
result contains more than m bits. The right portion of the result contains the L, G, R and S bits, where L is the least
significant bit of the mantissa. The rounding mode uses these bits to perform the rounding and generate the final
result (Mout) in m-bit format. When Ex − Ey ≥ m, as shown in Figure 4.b, the right-shifted My contributes to the
intermediate result only in the G, R and S bits. Depending on the rounding mode, the output mantissa will be Mx or
Mx + 1. Therefore, we need only one specification in each rounding mode for the cases Ex − Ey ≥ m.

A similar analysis can be applied to the case Ey > Ex. Thus, the specifications for true addition with rounding
can be written as:

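$$
\begin{aligned}
C_{a1}[i] &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_x} \times 2^{E_x - B} \times (M_x + (M_y \gg i))\right), & 0 \le i < m\\
C_{a2} &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_x} \times 2^{E_x - B} \times (M_x + (M_y \gg m))\right), & i \ge m\\
C_{a3}[i] &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_y} \times 2^{E_y - B} \times (M_y + (M_x \gg i))\right), & 0 < i < m\\
C_{a4} &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_y} \times 2^{E_y - B} \times (M_y + (M_x \gg m))\right), & i \ge m
\end{aligned}
\qquad (5)
$$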

where Ca1[i], Ca2, Ca3[i] and Ca4 are the conditions Cond_add & Ex = Ey + i, Cond_add & Ex ≥ Ey + m,
Cond_add & Ey = Ex + i, and Cond_add & Ey ≥ Ex + m, respectively. Cond_add represents the condition
for true addition and the exponent range (i.e., normal and denormal numbers only). OUT is composed of the
outputs Sout, Eout and Mout. Conditions Ex − Ey = i and Ex − Ey ≥ m are represented by 2^Ex = 2^(Ey+i) and
2^Ex ≥ 2^(Ey+m). The exponent variables must use Shannon decomposition to represent the FP functions efficiently
[6]. With this decomposition, the graphs of Ex and Ey are exponential in size as *PHDDs, but those of 2^Ex and 2^Ey
have linear size. While building BDDs and *PHDDs for OUT from the circuit, the condition on the left side
of ⇒ is used to simplify the BDDs automatically by conditional forward simulation.
The number of specifications for true addition is 2m + 1. For instance, the value of m for IEEE double
precision is 53; thus the number of specifications for true addition is 107. Since the specifications are very
similar to one another, they can be generated by a looping construct in word-level SMV.
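The loop that generates these cases can be sketched outside SMV. The Python below is only a stand-in for the word-level SMV looping construct (whose syntax is not reproduced here): it enumerates the conditions of Equation 5 as predicates on d = Ex − Ey and confirms both the count 2m + 1 and that every finite exponent difference falls into exactly one case.

```python
def true_addition_conditions(m):
    """Enumerate the true-addition cases of Equation 5 as (name, predicate on
    the exponent difference d = Ex - Ey). Names mirror Ca1..Ca4."""
    conds = [(f"Ca1[{i}]", lambda d, i=i: d == i) for i in range(0, m)]
    conds.append(("Ca2", lambda d: d >= m))
    conds += [(f"Ca3[{i}]", lambda d, i=i: d == -i) for i in range(1, m)]
    conds.append(("Ca4", lambda d: d <= -m))
    return conds

conds = true_addition_conditions(53)
print(len(conds))   # → 107

# For double precision, Ex - Ey ranges over -2046..2046 (n = 11 exponent bits).
assert all(sum(p(d) for _, p in conds) == 1 for d in range(-2046, 2047))
```

Starting Ca3 at i = 1 avoids counting the Ex = Ey case twice, which is what yields 2m + 1 rather than 2m + 2.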


4.2 True Subtraction

The specification for true subtraction can be divided into two cases: far (|Ex − Ey| > 1) and close (Ex − Ey = 0, 1,
or −1). For the far case, the result of mantissa subtraction does not require a massive left shift (i.e., the LZA is not
active). Similar to true addition, the specifications for the far case of true subtraction can be written as Equation 6.

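$$
\begin{aligned}
C_{s1}[i] &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_x} \times 2^{E_x - B} \times (M_x - (M_y \gg i))\right), & 2 \le i < m\\
C_{s2} &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_x} \times 2^{E_x - B} \times (M_x - (M_y \gg m))\right), & i \ge m\\
C_{s3}[i] &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_y} \times 2^{E_y - B} \times (M_y - (M_x \gg i))\right), & 2 \le i < m\\
C_{s4} &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_y} \times 2^{E_y - B} \times (M_y - (M_x \gg m))\right), & i \ge m
\end{aligned}
\qquad (6)
$$

where Cs1[i], Cs2, Cs3[i] and Cs4 are Cond_sub & Ex = Ey + i, Cond_sub & Ex ≥ Ey + m, Cond_sub & Ey = Ex + i,
and Cond_sub & Ey ≥ Ex + m, respectively. Cond_sub represents the condition for true subtraction.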

For the close case, the difference of the two mantissas may generate some leading zeros, so that normalization
is required to produce a result in IEEE format. For example, when Ex − Ey = 0, Mx − My = 0.0...01
must be left shifted by m − 1 bits to 1.0...00. The number of bits to left shift is computed in the LZA circuit and
fed into the left shifter to perform normalization and into the subtractor to adjust the exponent. The number of
bits to be left shifted ranges from 0 to m and is a function of Mx and My. The combination of left shifting and
mantissa subtraction makes the *PHDDs irregular and causes them to grow exponentially. Therefore, the specifications
for these cases must be divided further to cope with the exponential growth of *PHDD sizes.

Based on the number of leading zeros in the intermediate result of mantissa subtraction, we divide the
specifications for the close case of true subtraction as shown in Equation 7.

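$$
\begin{aligned}
C_{c1}[i] &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_x} \times 2^{E_x - B} \times (M_x - (M_y \gg 1))\right), & 0 \le i \le m\\
C_{c2}[i] &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_y} \times 2^{E_y - B} \times (M_y - (M_x \gg 1))\right), & 0 \le i \le m\\
C_{c3}[i] &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_x} \times 2^{E_x - B} \times (M_x - M_y)\right), & 1 \le i < m\\
C_{c4}[i] &\Rightarrow OUT = \mathrm{Round}\left((-1)^{S_y} \times 2^{E_y - B} \times (M_y - M_x)\right), & 1 \le i < m
\end{aligned}
\qquad (7)
$$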

where Cc1[i], Cc2[i], Cc3[i], and Cc4[i] are Cond_sub & Ex = Ey + 1 & LS1[i], Cond_sub & Ey = Ex + 1 &
LS2[i], Cond_sub & Ex = Ey & Mx > My & LS3[i], and Cond_sub & Ey = Ex & Mx < My & LS4[i],
respectively. LS1[i], LS2[i], LS3[i] and LS4[i] represent the conditions that the intermediate result has i leading
zeros to be left shifted. They are computed by 2^(m−1−i) ≤ Mx − (My ≫ 1) < 2^(m−i),
2^(m−1−i) ≤ My − (Mx ≫ 1) < 2^(m−i), 2^(m−1−i) ≤ Mx − My < 2^(m−i), and 2^(m−1−i) ≤ My − Mx < 2^(m−i),
respectively. A special case is that the output is zero when Ex is equal to Ey and Mx is equal to My. The
specification is as follows: (Cond_sub & Ex = Ey & Mx = My) ⇒ OUT = 0.
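The close-case split can be exercised for a small mantissa width to see which conditions of Equation 7 actually occur. The sketch below assumes normalized m-bit integer mantissas (denormal operands are not modeled) and exact shifts; `close_case` is a hypothetical helper that returns the name of the single condition an input satisfies.

```python
from fractions import Fraction

def close_case(d, mx, my, m):
    """Return the Equation 7 condition that holds for exponent difference
    d = Ex - Ey in {-1, 0, 1} and normalized m-bit mantissas mx, my."""
    if d == 1:
        name, diff = "Cc1", mx - Fraction(my, 2)
    elif d == -1:
        name, diff = "Cc2", my - Fraction(mx, 2)
    elif d == 0 and mx > my:
        name, diff = "Cc3", Fraction(mx - my)
    elif d == 0 and mx < my:
        name, diff = "Cc4", Fraction(my - mx)
    elif d == 0:
        return "zero"                      # Ex = Ey and Mx = My: OUT = 0
    else:
        raise ValueError("not a close case")
    # LS[i]: 2^(m-1-i) <= diff < 2^(m-i), i.e. i leading zeros to shift out
    for i in range(0, m + 1):
        if Fraction(2) ** (m - 1 - i) <= diff < Fraction(2) ** (m - i):
            return f"{name}[{i}]"
    raise AssertionError("input not covered by any LS condition")

m = 6
seen = set()
for d in (-1, 0, 1):
    for mx in range(2 ** (m - 1), 2 ** m):
        for my in range(2 ** (m - 1), 2 ** m):
            seen.add(close_case(d, mx, my, m))
print(len(seen))   # → 25 (= 4m + 1)
```

When the exponents differ by one, the half-bit produced by the 1-bit shift is what allows up to m leading zeros, while equal exponents always leave at least one.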

4.3  Specification  Coverage 

Since  the  specifications  of  floating-point  adders  are  split  into  several  hundreds  of  sub-specifications,  do  these 
sub-specifications  cover  the  entire  input  space?  To  answer  this  questions,  someone  would  suggest  to  use 
theorem  provers  to  handle  the  case  splitting.  In  contrast,  we  propose  a  BDD  approach  to  compute  the  coverage 
of  our  specifications. 

Our approach is based on the observation that our specifications have the form "cond ⇒ out = expected_result", where cond depends only on the inputs of the circuit. Thus, the union of the conds of our specifications, which can be computed by BDD operations, must be TRUE when our specifications cover the entire input space. In other words, the union of the conds can be used to compute the percentage of the input space covered by our specifications and to generate the cases that are not covered.
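The coverage computation can be illustrated without a BDD package by brute-force enumeration over a toy input space. In the sketch below, each sub-specification's cond is a Python predicate over hypothetical 3-bit exponents and 3-bit mantissas; the union of the conds gives the covered fraction, and its complement lists the uncovered inputs. A real implementation would instead build the union with BDD Or operations and count satisfying assignments, which scales to IEEE double precision where enumeration cannot.

```python
from itertools import product

EXP_BITS, MAN_BITS = 3, 3        # toy field widths, far below IEEE double precision

# Hypothetical sub-specification conditions over (ex, ey, mx, my).
CONDS = [
    lambda ex, ey, mx, my: ex > ey + 1,              # far case, X has the larger exponent
    lambda ex, ey, mx, my: ey > ex + 1,              # far case, Y has the larger exponent
    lambda ex, ey, mx, my: abs(ex - ey) == 1,        # close case with a 1-bit shift
    lambda ex, ey, mx, my: ex == ey and mx != my,    # close case without a shift
    # Deliberately missing: ex == ey and mx == my (the zero-result sub-specification).
]


def coverage(conds):
    """Fraction of the input space covered by the union of conds, plus the uncovered inputs."""
    space = list(product(range(1 << EXP_BITS), range(1 << EXP_BITS),
                         range(1 << MAN_BITS), range(1 << MAN_BITS)))
    uncovered = [p for p in space if not any(c(*p) for c in conds)]
    return 1 - len(uncovered) / len(space), uncovered


ratio, missing = coverage(CONDS)
print(f"covered {ratio:.2%}; first uncovered input: {missing[0]}")
```

Here the missing zero-result case shows up as exactly the 64 inputs with equal exponents and equal mantissas, mirroring how the uncovered cases of the real specifications are generated.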

5  Verification  System:  Extended  Word-Level  SMV  with  *PHDDs 

Model checking is a technique to determine which states satisfy a given temporal logic formula for a given state-transition graph. In SMV [19], BDDs are used to represent the transition relation and sets of states, and the model checking process is performed iteratively on these BDDs. SMV has been widely used to verify control circuits in industry, but for arithmetic circuits, particularly those containing multipliers, the BDDs grow too large to be tractable. Furthermore, Boolean formulas are not an appropriate way to express the desired behavior of such circuits.

To verify arithmetic circuits, word-level SMV [9] extended SMV with HDDs to handle word-level expressions in the specification formulas. In word-level SMV, the transition relation, as well as those formulas that do not involve words, is represented using BDDs. HDDs are used only to compute word-level expressions such as addition and multiplication. When a relational operation is performed on two HDDs, a BDD is used to represent the set of assignments that satisfies the relation. The BDDs for temporal formulas are computed in the same way as in SMV. For example, the formula AG(R = A + B), where R, A, and B are word-level functions and AG is a temporal operator, is evaluated by first computing the HDDs for R, A, B, and A + B, then generating the BDD for the relation R = A + B, and finally applying the AG operator to this BDD. The reader can refer to [9] for the details of word-level SMV.


Figure 5: Horizontal division of a combinational circuit.

We  have  integrated  *PHDDs  into  word-level  SMV  and  introduced  relational  operators  for  floating-point 
numbers.  As  in  word-level  SMV,  only  the  word-level  functions  are  represented  by  *PHDDs  and  the  rest  of  the 
functions  are  represented  by  BDDs. 

Zhao's thesis [24] describes layering backward substitution, a variant of Hamaguchi's backward substitution approach [15], although the publicly released version of word-level SMV does not implement this feature. We have implemented it in our system. The main idea of layering backward substitution is to virtually cut the circuit horizontally by introducing auxiliary variables, which avoids the explosion of BDDs while symbolically evaluating bit-level circuits. Figure 5 shows a horizontal division of a combinational circuit with primary inputs $x_0, \ldots, x_m$ and outputs $y_0, \ldots, y_n$. For $0 \le i \le n$, $y_i = f_i(x_0, \ldots, x_m)$, where $f_i$ is a Boolean function that may not be feasible to represent as a BDD. The circuit is divided into several layers by declaring some of the internal nodes as auxiliary variables. In this example, $y_i = f_{1i}(z_0, \ldots, z_k)$, $z_i = f_{2i}(w_0, \ldots, w_l)$, and $w_i = f_{3i}(x_0, \ldots, x_m)$. Since each $f_{ji}$ is simpler than $f_i$, the BDDs representing them are generally much smaller. To compute the *PHDD representation of the word $(y_0, \ldots, y_n)$ in terms of the variables $x_0, \ldots, x_m$, we first compute its *PHDD representation in terms of the variables $z_0, \ldots, z_k$ as $F = \sum_{i=0}^{n} 2^i \times f_{1i}(z_0, \ldots, z_k)$. Then we replace each $z_i$, one at a time, by $f_{2i}(w_0, \ldots, w_l)$. After this, we have the *PHDD representation of the word in terms of the variables $w_0, \ldots, w_l$. Likewise, we replace each $w_i$ by $f_{3i}(x_0, \ldots, x_m)$. In this way, the *PHDD representation of the word in terms of the primary inputs can be computed without building BDDs for each output bit.
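The substitution step can be sketched with a simple canonical word-level representation: a multilinear polynomial over Boolean variables (a map from monomials to integer coefficients), which stands in for the *PHDD here. The circuit is a hypothetical two-layer cut of a 2-bit adder whose w-variables play the role of auxiliary variables; the code builds $F = \sum_i 2^i \times f_{1i}(w)$ over the upper layer, substitutes each $w_i$ by its lower-layer function, and checks the result against direct evaluation.

```python
from itertools import product


def var(name):
    return {frozenset([name]): 1}


def add(p, q, k=1):
    """Return p + k*q, dropping zero coefficients so the form stays canonical."""
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + k * c
        if r[m] == 0:
            del r[m]
    return r


def mul(p, q):
    """Product using x*x = x for Boolean x (monomials merge by set union)."""
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r = add(r, {m1 | m2: c1 * c2})
    return r


def AND(p, q):
    return mul(p, q)


def OR(p, q):
    return add(add(p, q), mul(p, q), -1)      # p + q - pq


def XOR(p, q):
    return add(add(p, q), mul(p, q), -2)      # p + q - 2pq


def substitute(p, v, g):
    """Replace Boolean variable v by the function g (a polynomial in other variables)."""
    r = {}
    for m, c in p.items():
        r = add(r, mul({m - {v}: c}, g) if v in m else {m: c})
    return r


def evaluate(p, env):
    return sum(c for m, c in p.items() if all(env[v] for v in m))


# Hypothetical two-layer cut of a 2-bit adder; w0..w3 are the auxiliary variables.
a0, a1, b0, b1 = (var(n) for n in ("a0", "a1", "b0", "b1"))
lower = {"w0": XOR(a0, b0), "w1": AND(a0, b0), "w2": XOR(a1, b1), "w3": AND(a1, b1)}
w0, w1, w2, w3 = (var(n) for n in lower)
upper = [w0, XOR(w2, w1), OR(w3, AND(w2, w1))]   # y0, y1, y2 in terms of w

F = {}
for i, y in enumerate(upper):                    # F = sum_i 2^i * f_1i(w)
    F = add(F, y, 1 << i)
for name, g in lower.items():                    # replace each w_i, one at a time
    F = substitute(F, name, g)

print(F)                                         # word-level A + B over the primary inputs
for bits in product((0, 1), repeat=4):
    env = dict(zip(("a0", "a1", "b0", "b1"), bits))
    assert evaluate(F, env) == env["a0"] + env["b0"] + 2 * (env["a1"] + env["b1"])
```

No BDD for an individual output bit is ever built: the word-level form collapses to the linear expression $a_0 + b_0 + 2a_1 + 2b_1$ as the substitutions cancel.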

The main drawback of backward substitution is that users still need to provide the auxiliary variables (i.e., the virtual cuts). Another drawback is that the *PHDDs may grow exponentially during the substitution process, since the auxiliary variables may generalize the circuit behavior in some regions. For example, suppose that the internal nodes $z_k$ and $z_{k-1}$ of the original circuit can never both be 0 at the same time, and that the circuit of region 1 is designed to handle only this case. After the auxiliary variables are introduced, $z_k$ and $z_{k-1}$ can be 0 simultaneously. Hence, the word-level function F represents a more general function than the original circuit of region 1, and this generalization may cause the *PHDD for F to blow up.

5.1  Conditional  Symbolic  Simulation 

To overcome these drawbacks, we introduced a conditional symbolic simulation technique into word-level SMV. Symbolic simulation [3] performs the simulation with inputs having symbolic values (i.e., Boolean variables or Boolean functions), and the simulation process builds BDDs for the circuit. If each input is a Boolean variable, this approach may cause an explosion of BDD sizes in the middle of the process, because it tries to simulate the entire circuit for all possible inputs at once. The idea of conditional symbolic simulation is to perform the simulation under a restricted condition, expressed as a Boolean function over the inputs.

In [17], Jain and Gopalakrishnan encoded the conditions together with the original inputs as new inputs to the symbolic simulator using a parametric form of Boolean expressions, but this approach is hard to incorporate into word-level SMV. Our approach is to apply the conditions directly during the symbolic simulation process. Right after the BDD for a circuit gate is built, the condition is used to simplify it with the restrict algorithm [12]. The simplified BDD is then used as the input function for the gates driven by this gate. This process is repeated until the outputs are reached. This approach can be viewed as dynamically extracting the circuit behavior under the specified condition without modifying the actual circuit.
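The per-gate restriction can be sketched with a minimal BDD package. The code below is a toy ROBDD with an `apply` operation and the Coudert-Madre restrict algorithm (our rendering of the operator cited as [12], not the system's code); the three-gate netlist and the condition $x_0 = 0$ are hypothetical. Each gate's BDD is restricted as soon as it is built, so the fan-out gates only ever see the simplified function.

```python
import operator
from itertools import product


class BDD:
    """Minimal ROBDD. Ids 0 and 1 are the terminals; smaller variable index = nearer the root."""

    def __init__(self):
        self.nodes = {}                       # id -> (var, lo, hi) for non-terminal nodes
        self.unique = {}                      # (var, lo, hi) -> id (hash-consing)
        self.apply_cache, self.restrict_cache = {}, {}

    def mk(self, v, lo, hi):
        if lo == hi:                          # redundant test: no node needed
            return lo
        key = (v, lo, hi)
        if key not in self.unique:
            self.unique[key] = len(self.nodes) + 2
            self.nodes[self.unique[key]] = key
        return self.unique[key]

    def var(self, v):
        return self.mk(v, 0, 1)

    def top(self, u):
        return self.nodes[u][0] if u > 1 else float("inf")

    def cofactors(self, u, v):
        """(u|v=0, u|v=1), assuming v is not below the top variable of u."""
        if self.top(u) == v:
            return self.nodes[u][1], self.nodes[u][2]
        return u, u

    def apply(self, op, a, b):
        """Combine two BDDs with a Boolean operator on 0/1 ints (e.g. operator.and_)."""
        if a <= 1 and b <= 1:
            return op(a, b)
        key = (op, a, b)
        if key not in self.apply_cache:
            v = min(self.top(a), self.top(b))
            (a0, a1), (b0, b1) = self.cofactors(a, v), self.cofactors(b, v)
            self.apply_cache[key] = self.mk(v, self.apply(op, a0, b0), self.apply(op, a1, b1))
        return self.apply_cache[key]

    def restrict(self, f, c):
        """Coudert-Madre restrict: a BDD equal to f wherever c = 1, usually smaller."""
        if c <= 1 or f <= 1:                  # constant f or c: nothing to simplify
            return f
        key = (f, c)
        if key not in self.restrict_cache:
            if self.top(c) < self.top(f):     # c's top variable lies above f: quantify it out
                c0, c1 = self.cofactors(c, self.top(c))
                r = self.restrict(f, self.apply(operator.or_, c0, c1))
            else:
                v = self.top(f)
                (f0, f1), (c0, c1) = self.cofactors(f, v), self.cofactors(c, v)
                if c0 == 0:
                    r = self.restrict(f1, c1)
                elif c1 == 0:
                    r = self.restrict(f0, c0)
                else:
                    r = self.mk(v, self.restrict(f0, c0), self.restrict(f1, c1))
            self.restrict_cache[key] = r
        return self.restrict_cache[key]

    def evaluate(self, u, env):
        while u > 1:
            v, lo, hi = self.nodes[u]
            u = hi if env[v] else lo
        return u

    def size(self, u):
        """Number of non-terminal nodes reachable from u."""
        seen, stack = set(), [u]
        while stack:
            n = stack.pop()
            if n > 1 and n not in seen:
                seen.add(n)
                stack.extend(self.nodes[n][1:])
        return len(seen)


# Hypothetical netlist in topological order: (output, operator, input_a, input_b).
GATES = [("g1", operator.and_, "x0", "x1"),
         ("g2", operator.xor, "x2", "x3"),
         ("out", operator.or_, "g1", "g2")]


def simulate(bdd, gates, signals, cond=None):
    """Gate-by-gate symbolic simulation; with cond, each gate's BDD is restricted at once."""
    sig = dict(signals)
    for name, op, a, b in gates:
        g = bdd.apply(op, sig[a], sig[b])
        if cond is not None:
            g = bdd.restrict(g, cond)         # fan-out gates see the simplified BDD
        sig[name] = g
    return sig


bdd = BDD()
inputs = {f"x{i}": bdd.var(i) for i in range(4)}
cond = bdd.mk(0, 1, 0)                        # condition: x0 = 0
full = simulate(bdd, GATES, inputs)["out"]
cut = simulate(bdd, GATES, inputs, cond)["out"]
print(bdd.size(full), bdd.size(cut))          # 5 3
for bits in product((0, 1), repeat=4):
    if bits[0] == 0:                          # wherever the condition holds, results agree
        env = dict(enumerate(bits))
        assert bdd.evaluate(full, env) == bdd.evaluate(cut, env)
```

Under the condition, gate g1 collapses to 0 immediately, so the Or gate never builds the larger unrestricted function; the circuit itself is not modified.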

5.2  Equalities  and  Inequalities  with  Conditions 

To verify arithmetic circuits, it is very useful to compute the set of assignments that satisfy F ~ G, where F and G are word-level functions represented by HDDs or *PHDDs, and ~ can be any one of =, ≠, <, >, ≤, ≥. In general, the complexity of this problem is exponential. However, Clarke et al. presented a branch-and-bound algorithm that solves this problem efficiently for a special class of HDDs, called linear expression functions, using the positive Davio decomposition [8]. The basic idea of their algorithm is first to compute H = F − G and then to compute the set of assignments satisfying H ~ 0 using a branch-and-bound approach. The complexity of subtracting two HDDs is O(|F| × |G|). This algorithm works well only for the special class of HDDs (i.e., linear expression functions); for other classes of HDDs or *PHDDs, its complexity can grow exponentially. In the verification of arithmetic circuits, HDDs and *PHDDs are not always linear expression functions, so the ~ 0 operations cannot be computed in most cases. For example, checking "OUT = Round(...)" in some specifications of FP adders in Section 4.2 did not finish after several CPU hours.
bdd cond_equal_0(<w_h, h>, cond)

1 if cond is FALSE, return FALSE;

2 if <w_h, h> is a terminal node, return (<w_h, h> = 0) ? TRUE : FALSE;

3 if the operation (cond_equal_0, <w_h, h>, cond) is in the computed cache, return the result found in the cache;

4 r ← top variable of h and cond;

5 <w_h0, h0>, <w_h1, h1> ← 0- and 1-branch of <w_h, h> with respect to variable r;

6 cond0, cond1 ← 0- and 1-branch of cond with respect to variable r;

7 bound_value(<w_h0, h0>, upper_h0, lower_h0); bound_value(<w_h1, h1>, upper_h1, lower_h1);

8 if (r uses the Shannon decomposition) {

9 if (upper_h0 < 0 || lower_h0 > 0) res0 ← FALSE;

10 else res0 ← cond_equal_0(<w_h0, h0>, cond0);

11 res1 is computed similarly to res0;

12 } else if (r uses the positive Davio decomposition) {

13 res0 is computed the same as res0 in the Shannon decomposition;

14 upper_h1 ← upper_h1 + upper_h0; lower_h1 ← lower_h1 + lower_h0;

15 if (upper_h1 < 0 || lower_h1 > 0) res1 ← FALSE;

16 else if (cond1 is FALSE) res1 ← FALSE;

17 else {

18 <w_h1, h1> ← addition(<w_h1, h1>, <w_h0, h0>);

19 res1 ← cond_equal_0(<w_h1, h1>, cond1);

20 }

21 } else if (r uses the negative Davio decomposition) {

res0 and res1 are computed similarly to the positive Davio decomposition;

22 }

23 result ← find BDD node (r, res0, res1) in the unique table, or create one if it does not exist;

24 insert (cond_equal_0, <w_h, h>, cond, result) into the computed cache;

25 return result;


Figure 6: Algorithm for H = 0 with conditions, where H = <w_h, h>.
To solve this problem, we introduce relational operations with conditions, since the equalities and inequalities in our specifications need to hold only under the conditions. These operations take three arguments F, G, and cond, where F and G are word-level functions and cond is a Boolean function. Each operation first computes H = F − G and then computes the set of assignments satisfying H ~ 0 under the condition cond. For example, the algorithm for H = 0 under the condition cond is given in Figure 6. This algorithm produces the BDD for the assignments satisfying H = 0 under cond. It is similar to the algorithm in [8], except that it takes an extra BDD argument for the condition and uses the condition to stop the equality checking as early as possible. To be conservative, when the condition is false, the returned result is false. In line 1, the condition stops the algorithm when it is false. In line 16, the condition also prevents the addition of two *PHDDs in line 18 and the further equality checking in line 19. The efficiency of this algorithm depends on the BDD for the condition. If the condition is always true, the algorithm behaves the same as Clarke's algorithm. If the condition is always false, it immediately returns false, regardless of how complex the *PHDD is. This new algorithm dramatically reduced the computation time for the specifications in Section 4.2.

5.3  Equalities  and  Inequalities 

The efficiency of Clarke's algorithm for relational operations on two HDDs depends on the complexity of computing H = F − G. The complexity of subtracting two HDDs is O(|F| × |G|), and similar algorithms can be used for these relational operators with *PHDDs. However, the complexity of subtracting two *PHDDs with disjoint sets of supporting variables may grow exponentially. For example, the complexity of subtracting two FP encodings represented by *PHDDs grows exponentially with the word size of the exponent part [6]. Thus, Clarke's algorithm is not suitable for these operators on two *PHDDs with disjoint sets of supporting variables. Such operators are commonly used in our specifications of FP adders in Sections 4.1 and 4.2.

We have developed algorithms for these relational operators on two *PHDDs with disjoint sets of supporting variables. Figure 7 shows the new algorithm for computing the BDD for the set of assignments that satisfy F > G; similar algorithms are used for the other relational operators. The main idea of this algorithm is to apply the branch-and-bound approach directly, without performing a subtraction, whose complexity could be exponential. First, if both arguments are constants, the algorithm returns the result of comparing them. In lines 2 and 3, the weights w_f and w_g are adjusted by their minimum to increase the sharing of operations, since $(2^{w_f} \times f) > (2^{w_g} \times g)$ is equivalent to $(2^{w_f - min} \times f) > (2^{w_g - min} \times g)$, where min is the minimum of w_f and w_g. Line 4 checks whether the comparison is in the computed cache and returns the result if it is found. In lines 5 to 7, the top variable r is chosen and the 0- and 1-branches of f and g are computed. In lines 8 and 9, the function bound_value computes the upper and lower bounds of these four subfunctions. The algorithm for bound_value is similar to that described in [8], except that edge weights are handled; its complexity is linear in the graph size. When r uses the Shannon decomposition, lines 11 and 12 try to bound and finish the search for the 0-branch. If this is not successful, line 13 recursively calls the algorithm for the 0-branch. The 1-branch is handled in a similar way. When r uses the positive Davio decomposition, the computation for the 0-branch is the same as in the Shannon decomposition. Since <w_f1, f1> is the linear moment of <w_f, f>, the 1-cofactor of <w_f, f> is equal to <w_f1, f1> + <w_f0, f0>, so it lies between the sum of the lower bounds and the sum of the upper bounds of <w_f1, f1> and <w_f0, f0>.

For the 1-branch, new upper and lower bounds for the 1-cofactors are computed in lines 17 and 18. In lines 19 and 20, these bounds are used to stop further checking of the 1-cofactors. If this is not successful, lines 21-24 add the constant and linear moments to obtain the 1-cofactors and recursively call the algorithm on them. For the negative Davio decomposition, the 0- and 1-branches are handled similarly to the positive Davio decomposition. After res0 and res1 are generated for the 0- and 1-cofactors, the result BDD is built and the operation is inserted into the computed cache for future lookups.
bdd greater_than(<w_f, f>, <w_g, g>)

1 if both <w_f, f> and <w_g, g> are terminal nodes, return ((<w_f, f>) > (<w_g, g>)) ? TRUE : FALSE;

2 min ← minimum(w_f, w_g);

3 w_f ← w_f − min; w_g ← w_g − min;

4 if the operation (greater_than, <w_f, f>, <w_g, g>) is in the computed cache, return the result found in the cache;

5 r ← top variable of f and g;

6 <w_f0, f0>, <w_f1, f1> ← 0- and 1-branch of <w_f, f> with respect to variable r;

7 <w_g0, g0>, <w_g1, g1> ← 0- and 1-branch of <w_g, g> with respect to variable r;

8 bound_value(<w_f0, f0>, upper_f0, lower_f0); bound_value(<w_g0, g0>, upper_g0, lower_g0);

9 bound_value(<w_f1, f1>, upper_f1, lower_f1); bound_value(<w_g1, g1>, upper_g1, lower_g1);

10 if (r uses the Shannon decomposition) {

11 if (upper_f0 < lower_g0) res0 ← FALSE;

12 else if (lower_f0 > upper_g0) res0 ← TRUE;

13 else res0 ← greater_than(<w_f0, f0>, <w_g0, g0>);

14 res1 is computed similarly to res0;

15 } else if (r uses the positive Davio decomposition) {

16 res0 is computed the same as res0 in the Shannon decomposition;

17 upper_f1 ← upper_f1 + upper_f0; upper_g1 ← upper_g1 + upper_g0;

18 lower_f1 ← lower_f1 + lower_f0; lower_g1 ← lower_g1 + lower_g0;

19 if (upper_f1 < lower_g1) res1 ← FALSE;

20 else if (lower_f1 > upper_g1) res1 ← TRUE;

21 else {

22 <w_f1, f1> ← addition(<w_f1, f1>, <w_f0, f0>);

<w_g1, g1> ← addition(<w_g1, g1>, <w_g0, g0>);

23 res1 ← greater_than(<w_f1, f1>, <w_g1, g1>);

24 }

25 } else if (r uses the negative Davio decomposition) {

26 res0 and res1 are computed similarly to the positive Davio decomposition;

27 }

28 result ← find BDD node (r, res0, res1) in the unique table, or create one if it does not exist;

29 insert (greater_than, <w_f, f>, <w_g, g>, result) into the computed cache;

30 return result;


Figure 7: Improved algorithm for F > G, where F = <w_f, f> and G = <w_g, g>.
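The key point of Figure 7, bounding F and G separately instead of forming F − G, can be sketched with the same toy representation as before: multilinear polynomials over Boolean variables standing in for *PHDDs, Shannon splitting only, and no edge-weight normalization (lines 2 and 3 of the figure are not modeled). The FALSE test uses ≤ rather than the figure's strict <; both are sound, and ≤ also settles two equal constants. The 2-bit operands are hypothetical.

```python
from itertools import product


def cofactor(p, v, b):
    """Set Boolean variable v to b (0 or 1) in a multilinear polynomial p."""
    r = {}
    for m, c in p.items():
        if v in m:
            if not b:
                continue                 # the monomial vanishes when v = 0
            m = m - {v}
        r[m] = r.get(m, 0) + c
    return {m: c for m, c in r.items() if c != 0}


def bounds(p):
    """(lower, upper) bound of p over all 0/1 assignments."""
    const = p.get(frozenset(), 0)
    rest = [c for m, c in p.items() if m]
    return const + sum(c for c in rest if c < 0), const + sum(c for c in rest if c > 0)


def ite(v, r0, r1):
    """The BDD node (v, r0, r1) written as the 0/1 polynomial r0 + v*r1 - v*r0."""
    r = dict(r0)
    for m, c in r1.items():
        r[m | {v}] = r.get(m | {v}, 0) + c
    for m, c in r0.items():
        r[m | {v}] = r.get(m | {v}, 0) - c
    return {m: c for m, c in r.items() if c != 0}


def evaluate(p, env):
    return sum(c for m, c in p.items() if all(env[v] for v in m))


TRUE, FALSE = {frozenset(): 1}, {}


def greater_than(f, g, order):
    """0/1 polynomial for F > G via branch-and-bound on f and g separately (no F - G)."""
    cache = {}

    def rec(f, g, i):
        f_lo, f_hi = bounds(f)
        g_lo, g_hi = bounds(g)
        if f_hi <= g_lo:                 # F never exceeds G on this branch
            return FALSE
        if f_lo > g_hi:                  # F always exceeds G on this branch
            return TRUE
        key = (frozenset(f.items()), frozenset(g.items()), i)
        if key not in cache:
            v = order[i]
            cache[key] = ite(v,
                             rec(cofactor(f, v, 0), cofactor(g, v, 0), i + 1),
                             rec(cofactor(f, v, 1), cofactor(g, v, 1), i + 1))
        return cache[key]

    return rec(f, g, 0)


ORDER = ["x1", "y1", "x0", "y0"]                        # interleaved, MSB first
F = {frozenset({"x1"}): 2, frozenset({"x0"}): 1}       # F = 2*x1 + x0
G = {frozenset({"y1"}): 2, frozenset({"y0"}): 1}       # G = 2*y1 + y0
gt = greater_than(F, G, ORDER)
for bits in product((0, 1), repeat=4):
    env = dict(zip(ORDER, bits))
    assert evaluate(gt, env) == (evaluate(F, env) > evaluate(G, env))
```

As in the figure, whole subtrees are settled by comparing the bounds of the two cofactors, for example the branch x1 = 0, y1 = 1 is decided FALSE without looking at x0 or y0.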




Figure  8:  *PHDDs  for  F  and  G. 


This algorithm works very well for two *PHDDs with disjoint sets of supporting variables, whereas Clarke's algorithm has exponential complexity. For example, let $F = \prod_{i=0}^{n} 2^{2^i \times x_i}$ and $G = \prod_{i=0}^{n} 2^{2^i \times y_i}$, where the variable ordering is $x_n, y_n, \ldots, x_0, y_0$ and all variables use the Shannon decomposition. The *PHDDs for F and G have the structure shown in Figure 8. It can be proven that the complexity of this algorithm for this type of function is O(n) if the computed cache is a complete cache.

5.4  Short-Circuiting  Technique 

Can we verify the specifications of FP adders by conditional forward simulation? In our experience, all specifications for the FP adder design without a mantissa comparator, as in Figure 2.a, can be verified by conditional forward simulation, but this is not the case for the FP adder containing a mantissa comparator, as in Figure 2.b. The problem is caused by a conflict between the variable orderings for the mantissa adder and for the mantissa comparator, which generates the signal $M_x < M_y$ (i.e., signal d in Figure 3). The best variable ordering for the comparator interleaves the two vectors from the most significant bit to the least significant bit (i.e., $x_{m-1}, y_{m-1}, \ldots, x_0, y_0$). Table 2 shows the CPU time in seconds and the BDD size of the signal d under different variable orderings, where the ordering offset is the number of bit positions by which the ordering is shifted from the best ordering. For example, when the ordering offset is 5, the ordering is $x_{m-1}, \ldots, x_{m-6}, y_{m-1}, x_{m-7}, y_{m-2}, \ldots, x_0, y_5, \ldots, y_0$. Clearly, the BDD size grows exponentially with the offset. In contrast to the comparator, the best ordering for the mantissa adder is $x_{m-1}, \ldots, x_{m-k-1}, y_{m-1}, x_{m-k-2}, y_{m-2}, \ldots, x_0, y_k, \ldots, y_0$ when the exponent difference is k. We observed that the best ordering for the specification represented by *PHDDs is the same as the best ordering for the mantissa adder. Thus, when the exponent difference is large, the extended word-level SMV cannot build the BDDs for both the mantissa comparator and the mantissa adder by conditional forward simulation.

| Ordering offset | BDD size | CPU time (s) |
|---|---|---|
| 0 | 157 | 0.68 |
| 1 | 309 | 0.88 |
| 2 | 608 | 1.35 |
| 3 | 1195 | 2.11 |
| 4 | 2346 | 3.79 |
| 5 | 4601 | 7.16 |
| 6 | 9016 | 13.05 |
| 7 | 17655 | 26.69 |
| 8 | 34550 | 61.61 |
| 9 | 67573 | 135.22 |
| 10 | 132084 | 276.23 |

Table 2: Performance measurements of a 52-bit comparator with different orderings.

Let us examine the compare unit carefully. We find that the signal d is used only when $E_x = E_y$. In other words, it is not necessary to build the BDDs for it when $|E_x - E_y|$ is greater than 0. Based on this fact, we introduce a short-circuiting technique to eliminate unnecessary computations as early as possible. The word-level SMV and *PHDD packages are modified to incorporate this technique. In the *PHDD package, BDD operators such as And and Or are modified to abort the operation and return a special token when the number of newly created BDD nodes within the call exceeds a size threshold. In word-level SMV, for an And gate with two inputs, if the first input evaluates to 0, 0 is returned without building the BDDs for the second input. Otherwise, the second input is evaluated. If the second input evaluates to 0 and the first input evaluates to the special token, 0 is returned. A similar technique is applied to two-input Or gates. Nand (Nor) gates can be decomposed into Not and And (Or) gates and use the same technique to terminate early. For other two-input logic gates, the result is the special token if either input evaluates to the special token. If the special token propagates to the output of the circuit, the size threshold is doubled and the output is recomputed. This process is repeated until the output BDD is built. For example, when the exponent difference is 30, the size threshold is 10000, the ordering is the best ordering for the mantissa adder, and the compare unit shown in Figure 3 is evaluated in the sequence d, e, f, g, h, conditional forward simulation yields the values special token, 0, 0, 1, and 1 for the signals d, e, f, g, and h, respectively. With these modifications, the new system can verify all of the specifications for both types of FP adders by conditional forward simulation. We believe that this short-circuiting technique can be generalized and used in verification tasks that exercise only part of a circuit.
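The evaluation rules above can be sketched with an abstract cost model. In this sketch, signal values are plain 0, 1, or a special TOKEN (standing for BDDs that reduce to constants under the condition), a "big" signal models a sub-circuit whose BDD has a known node count and is aborted when that count exceeds the threshold, And/Or gates short-circuit on a controlling input, and the driver doubles the threshold until the output is no longer TOKEN. The netlist is hypothetical and only loosely modeled on the d, e, f sequence; the node count 132084 is borrowed from Table 2 for flavor.

```python
TOKEN = "TOKEN"   # special token: the BDD operation was aborted


def simulate(netlist, out, threshold):
    """One pass of forward simulation with short-circuiting; returns 0, 1, or TOKEN."""
    memo = {}

    def ev(name):
        if name not in memo:
            kind, *args = netlist[name]
            if kind == "const":
                r = args[0]
            elif kind == "big":            # sub-circuit whose BDD has a known node count
                size, value = args
                r = value if size <= threshold else TOKEN
            elif kind == "and":
                a = ev(args[0])
                if a == 0:
                    r = 0                  # second input is never built
                else:
                    b = ev(args[1])
                    r = 0 if b == 0 else (TOKEN if TOKEN in (a, b) else a & b)
            elif kind == "or":
                a = ev(args[0])
                if a == 1:
                    r = 1
                else:
                    b = ev(args[1])
                    r = 1 if b == 1 else (TOKEN if TOKEN in (a, b) else a | b)
            else:                          # "xor": no controlling value, TOKEN propagates
                a, b = ev(args[0]), ev(args[1])
                r = TOKEN if TOKEN in (a, b) else a ^ b
            memo[name] = r
        return memo[name]

    return ev(out)


def simulate_with_retry(netlist, out, threshold):
    """Double the size threshold until the output is no longer the special token."""
    while True:
        r = simulate(netlist, out, threshold)
        if r != TOKEN:
            return r, threshold
        threshold *= 2


NETLIST = {
    "d": ("big", 132084, 1),     # hypothetical M_x < M_y comparator under a bad ordering
    "e": ("const", 0),           # E_x == E_y, false under the condition E_x - E_y = 30
    "f": ("and", "d", "e"),      # d only matters when E_x == E_y
    "x_gt": ("const", 1),        # E_x > E_y
    "g": ("or", "f", "x_gt"),
    "h": ("xor", "d", "x_gt"),   # a gate with no controlling value, for contrast
}

print(simulate_with_retry(NETLIST, "g", 10000))   # (1, 10000): d aborted, g still resolves
print(simulate_with_retry(NETLIST, "h", 10000))   # (0, 160000): threshold doubled 4 times
```

The contrast between g and h shows why the technique pays off only when the expensive signal feeds gates with a controlling value under the current condition.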


6  Verification  of  FP  Adders 

In this section, we use the FP adder in the Aurora III Chip [16], designed by Dr. Huff as part of his PhD dissertation at the University of Michigan, as an example to illustrate the verification of FP adders. This adder is based on the same approach as the SNAP FP adder [20] at Stanford University; Dr. Huff found several errors in the approach described in [20]. This FP adder handles only operands with normal values, and when the result is a denormal value, it is truncated to 0. The adder supports the IEEE double precision format and the four IEEE rounding modes. In this verification work, we verify the adder only in round-to-nearest mode, because we believe that this mode is the hardest one to verify. All experiments were carried out on a Sun 248 MHz UltraSPARC-II server with 1.5 GB of memory.

The  FP  adder  is  described  in  the  Verilog  language  in  a  hierarchical  manner.  The  circuit  was  synthesized  into 
flattened,  gate-level  Verilog,  which  contains  latches,  multiplexors,  and  logic  gates,  by  Dr.  John  Zhong  at  SGI. 
Then,  a  simple  Perl  script  was  used  to  translate  the  circuit  from  gate-level  Verilog  to  SMV  format. 
6.1 Latch Removal


Huff's FP adder is a pipelined, two-phase design with a latency of three clock cycles. We handled the latches during the translation from gate-level Verilog to SMV format. Figure 9.a shows the latches in the pipelined, two-phase design, in which the phase 2 clock is the complement of the phase 1 clock. Since we verify only the functional correctness of the design and the FP adder has no feedback loops, the latches can be replaced by And gates, as shown in Figure 9.b, without losing the functional behavior of the circuit. Because the phase 2 clock is the complement of the phase 1 clock, we must also replace the phase 2 clock by the phase 1 clock; otherwise, the And gates of the two phases could never pass data at the same time, and the circuit behavior would be incorrect.


Figure 9: Latch removal. (a) The pipelined, two-phase design. (b) The design after latch removal.


6.2  Design  with  Bugs 

In this section, we describe our experience with the verification of an FP adder with design errors. During the verification process, our system found several design errors in Huff's FP adder. These errors were not caught by the random simulation performed by Dr. Huff.

The first error we found occurs when A + C = 01.111...11, A + C + 1 = 10.000...00, and the rounding logic decides to add 1 to the least significant bit (i.e., the result should be A + C + 1), but the circuit outputs A + C as the result. This error is caused by incorrect logic in the path select unit, which categorized this case as a no-shift case instead of a right shift by 1. While we were verifying the specification of true addition, our system generated a counterexample for this case in about 50 seconds. To ensure that this bug was not introduced by the translation, we used Cadence's Verilog simulator to confirm the bug in the original design by simulating the input pattern generated by our system.

Another design error we found is in the sticky bit generation, which is based on the table on page 10 of Quach's paper describing the SNAP FP adder [20]. The table handles only cases in which the absolute value of the exponent difference is less than 54; the sticky bit is set to 1 whenever the absolute value of the exponent difference is greater than 53 (for normal numbers only). The bug is that the sticky bit is not always 1 when the absolute value of the exponent difference is equal to 54. Figure 10 shows the sticky bit generation when $E_x - E_y = 54$. Since $N_x$ has 52 bits, the leading 1 of $M_y$ becomes the round (R) bit, and the sticky (S) bit is the OR of all of the $N_y$ bits, which may be 0. Therefore, an entry for the case $|E_x - E_y| = 54$ is needed in the table of Quach's paper [20].

Figure 10: Sticky bit generation when $E_x - E_y = 54$.
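The exponent-difference-54 corner case can be reproduced with a few lines of integer arithmetic. The sketch assumes the layout we read from Figure 10: one guard bit G and one round bit R below the least significant bit of the 53-bit sum, with every bit shifted past R ORed into S. The function names and the model of the table's rule are hypothetical.

```python
def align_grs(my: int, k: int) -> tuple:
    """Right-shift a 53-bit significand (hidden bit included) by k and return (G, R, S)."""
    ext = my << 2                      # append the G and R bit positions
    kept = ext >> k                    # bits that remain at or above R
    lost = ext & ((1 << k) - 1)        # bits shifted past R
    return (kept >> 1) & 1, kept & 1, int(lost != 0)


def sticky_table_rule(k: int, exact_s: int) -> int:
    """Hypothetical model of the buggy table: S forced to 1 for every k > 53."""
    return 1 if k > 53 else exact_s


HIDDEN = 1 << 52
my = HIDDEN                            # N_y = 0: only the hidden bit is set
for k in (53, 54, 55):
    g, r, s = align_grs(my, k)
    print(k, (g, r, s), "table S =", sticky_table_rule(k, s))
```

With $N_y = 0$, the exact sticky bit is 0 at k = 54 (the hidden bit sits exactly in R), while the table rule reports 1; at k = 55 the hidden bit falls past R and both agree on S = 1.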

In our experience, design errors in the mantissa path do not cause the *PHDD explosion problem. However, when the error is in the exponent path, the *PHDD may grow exponentially while the output is being built. A useful tip for overcoming this problem is to reduce the exponent values to a smaller range by changing the exponent range condition in Cond_add or Cond_sub in Equation 5, 6, or 7.

6.3  Corrected  Designs 

After identifying the bugs, we fixed the circuit in the SMV format. In addition, we created another FP adder by adding the compare unit in Figure 2.b to Huff's FP adder. This new adder is equivalent to the FP adder in Figure 2.b, since the ones' complement unit is never active.

To verify the FP adders, we combined the specifications for both the addition and subtraction instructions into the specifications of true addition and true subtraction, and we used the same specifications to verify both FP adders. Table 3 shows the CPU time in seconds and the maximum memory required for the verification of both FP adders. The CPU time is the total time for verifying all specifications. For example, the specifications of true addition are partitioned into 18 groups, and the specifications in the same group use the same variable ordering; the CPU time is the sum of these 18 verification runs, and the maximum memory is the maximum memory requirement of these 18 runs. FP adder II cannot be verified by conditional forward simulation without the short-circuiting technique. For both FP adders, the verification can be done within two hours and requires at most 55 MB. Each individual specification can be verified in less than 200 seconds.


| Case | FP adder I: CPU time (s) | FP adder I: max. memory (MB) | FP adder II: CPU time (s) | FP adder II: max. memory (MB) |
|---|---|---|---|---|
| True addition | 3283 | 49 | 3329 | 55 |
| True subtraction (far) | 2654 | 35 | 2668 | 35 |
| True subtraction (close) | 994 | 53 | 1002 | 48 |

Table 3: Performance measurements of verification of FP adders. FP adder I is Huff's FP adder with bugs fixed. FP adder II is FP adder I with the compare unit in Figure 2.b. For true subtraction, far represents the cases $|E_x - E_y| > 1$, and close represents the cases $|E_x - E_y| \le 1$.

In our experience, the decomposition type of the subtrahend's variables in the true subtraction cases is very important to the verification time. For these cases, the best decomposition type for the subtrahend's variables is the negative Davio decomposition. If the subtrahend's variables use the positive Davio decomposition, the *PHDDs for OUT cannot be built even after a long CPU time (more than 4 hours).

As for coverage, the verified specifications cover 99.78% of the input space for the floating-point adders in IEEE round-to-nearest mode. The uncovered input space (0.22%) is due to the unimplemented circuits for handling operands with denormal, NaN, or ∞ values, and the cases where the result of true subtraction is a denormal value.

Our results should not be compared directly with the results in [7], since the FP adders handle different precisions (their adder handles IEEE extended double precision) and the CPU performance ratio between the two machines is unknown (they used an HP 9000 workstation with 256 MB of memory). Moreover, their approach partitioned the circuit into sub-circuits that are verified individually based on assumptions about their inputs, whereas our approach is implementation-independent.


7  Conversion  Circuits 

The overflow flag erratum of the FIST instruction (floating-point to integer conversion) [14] in Intel's Pentium Pro and Pentium II processors has illustrated the importance of verifying the conversion circuits [16], which convert data from one format to another (e.g., IEEE single precision to double precision). These circuits are another common unit in floating-point processors. For example, the MIPS processor supports conversions between any two of three number formats: integer, IEEE single precision, and IEEE double precision.

We believe that verifying the conversion circuits is much easier than verifying floating-point adders, since these circuits are much simpler and have only one operand (i.e., fewer variables than FP adders). For example, the specification of the double-to-single (D2S) operation, which converts data from double precision to single precision, can be written as "(overflow_flag = expected_overflow) & (!overflow_flag -> (output = expected_output))", where overflow_flag and output come directly from the circuit, and expected_overflow and expected_output are computed from the inputs. The overflow term is needed because some numbers represented in double precision cannot be represented in single precision. For example, expected_output is computed as $\mathit{Round}((-1)^S \times M \times 2^{E-B})$, where S, M, and E are the sign, mantissa, and exponent of the double-precision input, B = 1023 is the double-precision exponent bias, and Round rounds its argument to single precision. Similarly, expected_overflow can be computed from the inputs.
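A concrete, simulation-style analogue of the D2S specification is sketched below. Word-level SMV checks the property symbolically over all inputs; this sketch only evaluates it on individual values. It uses Python's `struct` module (whose 'f' format rounds a double to single precision in the default rounding mode and raises `OverflowError` when a finite value rounds to infinity) as a stand-in for Round, and it assumes finite inputs, since NaN compares unequal to itself. The names `reference_d2s` and `d2s_spec_holds` are hypothetical.

```python
import struct
from typing import Optional, Tuple


def reference_d2s(x: float) -> Tuple[Optional[float], bool]:
    """Return (expected_output, expected_overflow) for double-to-single conversion."""
    try:
        # Round to the nearest single-precision value, then widen back for comparison.
        return struct.unpack("<f", struct.pack("<f", x))[0], False
    except OverflowError:
        return None, True  # output is unconstrained when overflow is expected


def d2s_spec_holds(x: float, overflow_flag: bool, output: float) -> bool:
    """(overflow_flag = expected_overflow) & (!overflow_flag -> output = expected_output)."""
    expected_output, expected_overflow = reference_d2s(x)
    # The implication !p -> q is written as p or q.
    return overflow_flag == expected_overflow and (overflow_flag or output == expected_output)
```

A circuit model under test would supply `overflow_flag` and `output`; for instance, returning the unrounded double 0.1 as the single-precision output violates the specification, while any output is accepted when overflow is correctly flagged for 1e300.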

As another example, the specification of the single-to-double (S2D) operation, which converts data from single precision to double precision, can be written as "output = input", since every number represented in single precision can be represented in double precision without rounding (i.e., the output represents the exact value of the input).
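The S2D specification can likewise be illustrated concretely. The sketch below assumes that decoding a binary32 bit pattern into a Python float (a binary64) via `struct` stands in for the circuit, and compares that result against the exact rational value of the single-precision encoding, so "output = input" holds only if no rounding occurred. Infinity and NaN patterns are excluded; the helper names are hypothetical.

```python
import struct
from fractions import Fraction


def single_exact_value(bits: int) -> Fraction:
    """Exact rational value of a finite IEEE single-precision bit pattern."""
    sign = -1 if bits >> 31 else 1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        raise ValueError("infinity/NaN has no rational value")
    if exponent == 0:  # zero or denormal: 0.f x 2^-126
        return sign * Fraction(fraction, 1 << 23) * Fraction(1, 1 << 126)
    # Normal number: 1.f x 2^(exponent - 127)
    return sign * (1 + Fraction(fraction, 1 << 23)) * Fraction(2) ** (exponent - 127)


def s2d_spec_holds(bits: int) -> bool:
    """output = input: the binary64 result must equal the binary32 operand exactly."""
    as_double = struct.unpack("<f", struct.pack("<I", bits))[0]  # Python float is binary64
    return Fraction(as_double) == single_exact_value(bits)
```

Boundary patterns such as the smallest denormal (0x00000001), the smallest normal (0x00800000), and the largest finite value (0x7F7FFFFF) all satisfy the check, consistent with every single-precision number being exactly representable in double precision.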

8  Conclusions  and  Future  Work 

We presented extensions to word-level SMV that enable the verification of floating-point adders with implementation-independent specifications. Word-level SMV was improved by using the Multiplicative Power HDD (*PHDD) representation, by deriving efficient algorithms for equality and inequality operations between two *PHDDs, and by incorporating conditional symbolic simulation as well as a short-circuiting technique. Based on a case analysis of the signs and the relation between the two exponents, the specifications of floating-point adders are divided into several hundred implementation-independent sub-specifications.
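The top level of this case analysis corresponds to the three rows of Table 3: equal signs give a true addition, and different signs give a true subtraction that is split by exponent distance into far and close cases. The sketch below assumes the adder computes X + Y (so a subtract operation would flip `sign_y` first), uses a hypothetical function name, and does not reproduce the finer split into several hundred sub-specifications.

```python
def classify(sign_x: int, sign_y: int, exp_x: int, exp_y: int) -> str:
    """Top-level case of X + Y from the operand signs and (biased) exponents."""
    if sign_x == sign_y:
        # Magnitudes are added.
        return "true addition"
    if abs(exp_x - exp_y) > 1:
        # Magnitudes are subtracted and the exponents are far apart.
        return "true subtraction (far)"
    # Magnitudes are subtracted and |E_x - E_y| <= 1.
    return "true subtraction (close)"
```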

Conditional forward simulation has the advantage of allowing implementation-independent specifications. We identified a conflict in the variable orderings between the mantissa comparator and the mantissa adder that prevents the use of conditional forward simulation, and we presented a short-circuiting technique to solve this problem. This short-circuiting technique can be generalized to other verification tasks that exercise only part of a circuit.

We used our system and the implementation-independent specifications to verify an FP adder from the University of Michigan. Our system found several bugs in Huff's FP adder and generated counterexamples within several minutes. After the bugs were fixed, we created a variant of the Aurora III FP adder by introducing a mantissa comparator and extra circuitry, to demonstrate the capability of our system to handle different FP adder designs. For each FP adder, the verification task for IEEE double precision finished in 2 CPU hours on a Sun UltraSPARC-II server. The verified specifications covered 99.78% of the entire input space. The uncovered input space (0.22%) is due to the unimplemented circuits for denormal, NaN, and ∞ operands and for denormal results of true subtraction. We believe that our system and specifications can be applied to verify FP adders directly and to help find design errors.

The overflow flag erratum of the FIST instruction (floating-point to integer conversion) [14] in Intel's Pentium Pro and Pentium II processors has illustrated the importance of verifying the conversion circuits, which convert data from one format to another (e.g., IEEE single precision to double precision). Since these circuits are much simpler than floating-point adders and have only one input operand, we believe that our system can be used to verify their correctness. We plan to verify the conversion circuits in the Aurora III chip.


Acknowledgement 

We thank Prof. Brown, Dr. Huff, and Mr. Riepe at the University of Michigan for providing us with Huff's FP adder and for valuable discussions. We thank Dr. John Zhong at SGI for helping us synthesize Huff's FP adder design into flattened, gate-level Verilog. We thank Henry A. Rowley for providing us with the simple Perl script.

References 

[1] Aagaard, M. D., and Seger, C.-J. H. The formal verification of a pipelined double-precision IEEE floating-point multiplier. In Proceedings of the International Conference on Computer-Aided Design (November 1995), pp. 7-10.

[2] Bryant, R. E. Graph-based algorithms for Boolean function manipulation. In IEEE Transactions on Computers C-35, 8 (August 1986), pp. 677-691.

[3] Bryant, R. E., Beatty, D. L., and Seger, C.-J. H. Formal hardware verification by symbolic ternary trajectory. In Proceedings of the 28th ACM/IEEE Design Automation Conference (June 1991), pp. 397-402.

[4] Bryant, R. E., and Chen, Y.-A. Verification of arithmetic circuits with binary moment diagrams. In Proceedings of the 32nd ACM/IEEE Design Automation Conference (June 1995), pp. 535-541.

[5] Carreño, V. A., and Miner, P. S. Specification of the IEEE-854 floating-point standard in HOL and PVS. In High Order Logic Theorem Proving and Its Applications (September 1995).

[6] Chen, Y.-A., and Bryant, R. E. *PHDD: An efficient graph representation for floating point circuit verification. In Proceedings of the International Conference on Computer-Aided Design (November 1997), pp. 2-7.

[7] Chen, Y.-A., Clarke, E. M., Ho, P.-H., Hoskote, Y., Kam, T., Khaira, M., O'Leary, J., and Zhao, X. Verification of all circuits in a floating-point unit using word-level model checking. In Proceedings of Formal Methods in Computer-Aided Design (November 1996), pp. 19-33.

[8] Clarke, E. M., Fujita, M., and Zhao, X. Hybrid decision diagrams - overcoming the limitations of MTBDDs and BMDs. In Proceedings of the International Conference on Computer-Aided Design (November 1995), pp. 159-163.

[9] Clarke, E. M., Khaira, M., and Zhao, X. Word level model checking - Avoiding the Pentium FDIV error. In Proceedings of the 33rd ACM/IEEE Design Automation Conference (June 1996), pp. 645-648.




[10] Clarke, E. M., McMillan, K., Zhao, X., Fujita, M., and Yang, J. Spectral transforms for large Boolean functions with applications to technology mapping. In Proceedings of the 30th ACM/IEEE Design Automation Conference (June 1993), pp. 54-60.

[11] Coe, T. Inside the Pentium FDIV bug. Dr. Dobb's Journal (April 1996), pp. 129-135.

[12] Coudert, O., and Madre, J. C. A unified framework for the formal verification of sequential circuits. In Proceedings of the International Conference on Computer-Aided Design (November 1990), pp. 126-129.

[13] Drechsler, R., Becker, B., and Ruppertz, S. K*BMDs: a new data structure for verification. In Proceedings of European Design and Test Conference (March 1996), pp. 2-8.

[14] Fisher, L. M. Flaw reported in new Intel chip. New York Times (May 6, 1997), D, 4:3.

[15] Hamaguchi, K., Morita, A., and Yajima, S. Efficient construction of binary moment diagrams for verifying arithmetic circuits. In Proceedings of the International Conference on Computer-Aided Design (November 1995), pp. 78-82.

[16] Huff, T. R. Architectural and circuit issues for a high clock rate floating-point processor. PhD dissertation, Electrical Engineering Department, University of Michigan (1995).

[17] Jain, P., and Gopalakrishnan, G. Efficient symbolic simulation-based verification using the parametric form of Boolean expressions. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (August 1994), pp. 1005-1015.

[18] Leeser, M., and O'Leary, J. Verification of a subtractive radix-2 square root algorithm and implementation. In Proceedings of the 1995 IEEE International Conference on Computer Design: VLSI in Computers and Processors (October 1995), pp. 526-531.

[19] McMillan, K. L. Symbolic Model Checking. Kluwer Academic Publishers, 1993.

[20] Quach, N., and Flynn, M. Design and implementation of the SNAP floating-point adder. Tech. Rep. CSL-TR-91-501, Stanford University, December 1991.

[21] Rueß, H., Shankar, N., and Srivas, M. K. Modular verification of SRT division. In Computer-Aided Verification, CAV '96 (New Brunswick, NJ, July/August 1996), R. Alur and T. A. Henzinger, Eds., no. 1102 in Lecture Notes in Computer Science, Springer-Verlag, pp. 123-134.

[22] Sharangpani, H. P., and Barton, M. L. Statistical analysis of floating point flaw in the Pentium processor (1994). Tech. rep., Intel Corporation, November 1994.

[23] Suzuki, H., Morinaka, H., Makino, H., Nakase, Y., Mashiko, K., and Sumi, T. Leading-zero anticipatory logic for high-speed floating point addition. IEEE Journal of Solid-State Circuits (August 1996), pp. 1157-1164.

[24] Zhao, X. Verification of arithmetic circuits. Tech. Rep. CMU-CS-96-149, School of Computer Science, Carnegie Mellon University, 1996.




